Algorithms for the Data Placement Problem
Abstract
We consider the data placement problem introduced by Baev & Rajaraman [1]. We are given a set of caches $F$, a set of data objects $O$, and a set of clients $D$. Each object $s \in O$ has a length $l_s$, and each cache $i \in F$ has a capacity $u_i$ that limits the total length of the data objects that may be stored in it. Each client $j \in D$ has demand $d_j$ for a specific data object $s(j) \in O$ and must be assigned to a cache that stores that object. Storing object $s$ in cache $i$ incurs a storage cost of $f_i^s$, and assigning client $j$ to cache $i$ incurs an access cost of $d_j l_{s(j)} c_{ij}$, proportional to the distance $c_{ij}$ between $i$ and $j$. The data placement problem seeks a placement of data objects in caches that respects the cache capacities, together with an assignment of clients to caches, so as to minimize the total storage and client access cost. More precisely, for each cache $i \in F$ we want to determine the set of objects $O(i) \subseteq O$ it stores such that $\sum_{s \in O(i)} l_s \le u_i$, and to assign each client $j$ to a cache $i(j)$ that stores object $s(j)$, i.e., $s(j) \in O(i(j))$, so as to minimize $\sum_{i \in F} \sum_{s \in O(i)} f_i^s + \sum_{j \in D} d_j l_{s(j)} c_{i(j)j}$. We assume that the caches and clients are located in a common metric space, so the distances $c_{ij}$ form a metric. The data placement problem generalizes the metric uncapacitated facility location (UFL) problem and is NP-hard even when all object lengths are equal.
We give a 10-approximation algorithm for the problem when all objects have the same length, improving upon the approximation guarantee of 20.5 given by Baev & Rajaraman [1]. The improvement comes from a better rounding procedure for a natural LP relaxation of the problem, which is also considered in [1]. As in [1], we can modify the algorithm to get a bicriteria approximation guarantee when objects have different lengths: the placement returned has cost at most 10 times the optimum, but the total length of the objects stored in a cache may exceed the cache capacity by up to the maximum object length.
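The objective and constraints defined above can be made concrete with a small evaluator. The following is a minimal sketch, not part of the paper: the function name `placement_cost`, the dictionary-based encoding, and the toy instance are all assumptions chosen for illustration. It checks the capacity constraint $\sum_{s \in O(i)} l_s \le u_i$ and the requirement $s(j) \in O(i(j))$, then returns the total storage plus access cost.

```python
from typing import Dict, Hashable, Set, Tuple

Cache = Hashable
Obj = Hashable
Client = Hashable


def placement_cost(
    placement: Dict[Cache, Set[Obj]],              # O(i): objects stored in cache i
    assignment: Dict[Client, Cache],               # i(j): cache serving client j
    length: Dict[Obj, float],                      # l_s
    capacity: Dict[Cache, float],                  # u_i
    demand: Dict[Client, float],                   # d_j
    wanted: Dict[Client, Obj],                     # s(j)
    storage_cost: Dict[Tuple[Cache, Obj], float],  # f_i^s
    dist: Dict[Tuple[Cache, Client], float],       # c_ij
) -> float:
    """Return storage + access cost; raise ValueError if the solution is infeasible."""
    # Capacity constraint: total length stored in each cache is at most u_i.
    for i, objs in placement.items():
        if sum(length[s] for s in objs) > capacity[i]:
            raise ValueError(f"cache {i!r} exceeds its capacity")
    # Every client must be assigned to a cache that stores its object.
    if set(assignment) != set(demand):
        raise ValueError("every client must be assigned to exactly one cache")
    for j, i in assignment.items():
        if wanted[j] not in placement.get(i, set()):
            raise ValueError(f"cache {i!r} does not store the object of client {j!r}")
    storage = sum(storage_cost[i, s] for i, objs in placement.items() for s in objs)
    access = sum(demand[j] * length[wanted[j]] * dist[i, j]
                 for j, i in assignment.items())
    return storage + access


# Hypothetical toy instance: two caches, two unit-length objects, three clients.
length = {"x": 1, "y": 1}
capacity = {"A": 1, "B": 1}
demand = {1: 2, 2: 1, 3: 1}
wanted = {1: "x", 2: "y", 3: "x"}
storage_cost = {("A", "x"): 5, ("A", "y"): 5, ("B", "x"): 5, ("B", "y"): 5}
dist = {("A", 1): 1, ("A", 2): 4, ("A", 3): 2,
        ("B", 1): 3, ("B", 2): 1, ("B", 3): 2}

total = placement_cost(
    placement={"A": {"x"}, "B": {"y"}},
    assignment={1: "A", 2: "B", 3: "A"},
    length=length, capacity=capacity, demand=demand, wanted=wanted,
    storage_cost=storage_cost, dist=dist,
)
print(total)  # storage 10 + access 5 = 15
```

The sketch only evaluates a given solution; it does not implement the LP rounding or any approximation algorithm from the paper.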
We also extend the algorithm to the k-median variant, where, for every object $s$, a bound $k_s$ is imposed on the number of caches that may store $s$. As stated in [1], it is not hard to show, via a reduction from the PARTITION problem, that with arbitrary object lengths it is NP-complete even to decide whether a feasible solution exists; hence no approximation ratio is achievable in polynomial time unless P=NP. Baev & Rajaraman [1] showed that the problem is MAXSNP-hard even if all objects have the same length and there are no storage costs, by reducing metric UFL to the problem, and gave a 20.5-approximation algorithm. In their model, client $j$ has demand $d_{js}$ for every object $s \in O$ and incurs a corresponding access cost; however, this easily reduces to our model, since for every client $j$ and object $s$ we can simply create a copy $j(s)$ of $j$ with demand $d_{js}$ for object $s$.
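The reduction from the multi-object demand model of [1] to the single-object model can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `split_clients` and the dictionary encoding are hypothetical, each copy $j(s)$ is represented by the pair `(j, s)` and placed at the same location as $j$, and copies with zero demand are skipped (a convenience not stated in the abstract).

```python
from typing import Dict, Hashable, Tuple

Cache = Hashable
Obj = Hashable
Client = Hashable
Copy = Tuple[Client, Obj]  # the copy j(s) of client j for object s


def split_clients(
    multi_demand: Dict[Tuple[Client, Obj], float],  # d_js
    dist: Dict[Tuple[Cache, Client], float],        # c_ij
):
    """Turn clients with per-object demands d_js into single-object clients j(s)."""
    caches = {i for i, _ in dist}
    demand: Dict[Copy, float] = {}                    # d_{j(s)} = d_js
    wanted: Dict[Copy, Obj] = {}                      # s(j(s)) = s
    copy_dist: Dict[Tuple[Cache, Copy], float] = {}   # c_{i, j(s)} = c_ij
    for (j, s), d in multi_demand.items():
        if d == 0:
            continue  # assumption: a zero-demand copy adds no cost, so drop it
        copy = (j, s)
        demand[copy] = d
        wanted[copy] = s
        for i in caches:
            copy_dist[i, copy] = dist[i, j]  # the copy sits where client j sits
    return demand, wanted, copy_dist


# Hypothetical usage: client "j1" wants two objects, client "j2" wants none.
demand, wanted, copy_dist = split_clients(
    multi_demand={("j1", "x"): 2, ("j1", "y"): 3, ("j2", "x"): 0},
    dist={("A", "j1"): 1, ("A", "j2"): 2},
)
```

Because all copies of a client share its location, the distances to the copies still form a metric.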
Similar resources
New Ant Colony Algorithm Method based on Mutation for FPGA Placement Problem
Many real-world problems can be modelled as optimization problems, and evolutionary algorithms are used to solve them. Ant colony algorithms are a class of evolutionary algorithms inspired by the way certain ants search for food in nature. These ants leave pheromone trails on the ground to mark good paths that other members of the group can follow. Ant colony optim...
Communication-Aware Traffic Stream Optimization for Virtual Machine Placement in Cloud Datacenters with VL2 Topology
With the pervasiveness of cloud computing, a large number of applications from large organizations increasingly rely on cloud services. This demand has caused many applications to be executed on data center servers in the form of several virtual machine (VM) requests. Some applications are too big to be processed on a single VM. Also, there exists severa...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. Current Hadoop and the rack-aware data placement strategy of the existing Hadoop distributed file system, designed for MapReduce on a homogeneous Hadoop cluster, assume that each node in the cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...
Camera Placement in Industrial Photogrammetric Network Design Using Multiple Evolutionary Optimization
Nowadays, the subject of vision metrology network design is the local enhancement of an existing network. In other words, it has changed from the first-order to the third-order design concept. To improve the network locally, new camera stations should be added in its deficient areas. The accuracy of weak points is enhanced by the new images, if the related vision constraints are satisfied si...
PMU Placement Methods in Power Systems based on Evolutionary Algorithms and GPS Receiver
In this paper, optimal placement of Phasor Measurement Units (PMUs) using the Global Positioning System (GPS) is discussed. Ant Colony Optimization (ACO), Simulated Annealing (SA), Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) are used for this problem. The pheromone evaporation coefficient and the probability of an ant moving from state x to state y are introduced into the ACO. The modifi...
Capacitor Placement in Distorted Distribution Network Subject to Wind and Load Uncertainty
Capacitor banks are commonly used in distribution networks for local compensation of reactive power. This becomes more important when uncertainties such as wind generation and load uncertainty are considered. Harmonics and non-linear loads are further challenges in power systems that complicate the capacitor placement problem. Thus, uncertainty and network harmonics have been con...
Publication date: 2005